29 research outputs found

VALICO-UD: annotating an Italian learner corpus

Author: Di Nuovo Elisa
Publication venue: Università degli studi di Genova
Publication date: 10/10/2022

Previous work on learner language has highlighted the importance of annotated resources for describing the development of interlanguage. Despite this, few learner resources, mainly for English L2, feature error and syntactic annotation. This thesis describes the development of a novel parallel treebank of learner Italian, VALICO-UD. Its name points to two things: the source of the data, i.e. the corpus VALICO, a collection of non-native Italian texts elicited by comic strips, and the formalism used for linguistic annotation, i.e. Universal Dependencies (UD). It is a parallel treebank because, for each learner sentence (LS), the resource provides a target hypothesis (TH), i.e. a parallel corrected version written by an Italian native speaker, which is in turn annotated in UD. We developed this treebank to be usable for interlanguage research and comparable with the resources employed in Natural Language Processing tasks such as Native Language Identification or Grammatical Error Identification and Correction. VALICO-UD is composed of 237 texts written by English, French, German and Spanish native speakers, which correspond to 2,234 LSs, each associated with a single TH. While all LSs and THs were automatically annotated using UDPipe, only a portion of the treebank, made up of 398 LSs plus their corresponding THs, has been manually corrected; it was released in May 2021 in the UD repository. This core section also features an explicit XML-based annotation of the errors occurring in each sentence. Thus, the treebank is currently organized in two sections: the core gold standard, comprising 398 LSs and their corresponding THs, and the silver standard, consisting of 1,836 LSs and their corresponding THs.
In order to contribute to the computational investigation of the peculiar type of texts included in VALICO-UD, this thesis describes the annotation schema of the resource, provides some preliminary tests of the performance of UDPipe models on this treebank, reports inter-annotator agreement results for both error and linguistic annotation, and suggests some possible applications.

Archivio istituzionale della ricerca - Università di Genova

The Italian Dubbing and Subtitling of Monster, Inc.- an Analysis

Author: Di Nuovo Elisa
Publication date: 01/01/2018

Institutional Research Information System University of Turin

How Good are Humans at Native Language Identification? A Case Study on Italian L2 writings

Author: Bosco Cristina
Corino Elisa
Di Nuovo Elisa
Publication venue: CEUR
Publication date: 01/01/2020

In this paper we present a pilot study on human performance in the Native Language Identification task. We performed two tests aimed at exploring the human baseline for the task, in which test takers had to identify the writers’ L1 relying only on scripts written in Italian by English, French, German and Spanish native speakers. We then conducted an error analysis considering the language background of both test takers and text writers.

OpenEdition

Institutional Research Information System University of Turin

EliCoDe at MultiGED2023: fine-tuning XLM-RoBERTa for multilingual grammatical error detection

Author: Colla Davide
Delsanto Matteo
Di Nuovo Elisa
Publication venue: Linköping University Electronic Press
Publication date: 01/01/2023

Institutional Research Information System University of Turin

VALICO-UD: Treebanking an Italian Learner Corpus in Universal Dependencies

Author: Bosco Cristina
Corino Elisa
Di Nuovo Elisa
Mazzei Alessandro
Sanguinetti Manuela
Publication date: 01/01/2022

This article describes an ongoing project for the development of a novel Italian treebank in Universal Dependencies format: VALICO-UD. It consists of texts written by Italian L2 learners of different mother tongues (German, French, Spanish and English) drawn from VALICO, an Italian learner corpus elicited by comic strips. Aiming to build a parallel treebank, currently missing for Italian L2 and comparable with those exploited in Natural Language Processing tasks, we associated each learner sentence with a target hypothesis (i.e. a corrected version of the learner sentence written by an Italian native speaker), which is in turn annotated in Universal Dependencies. The treebank VALICO-UD is composed of 237 texts written by non-native speakers of Italian (2,234 sentences) and the related target hypotheses, all automatically annotated using UDPipe. A portion of this resource (36 texts corresponding to 398 learner sentences and related target hypotheses), first released in May 2021 in the Universal Dependencies repository, is associated with error annotation, and its automatic output has been fully manually checked. In this article, we focus especially on the challenges addressed in treebanking a resource composed of learner texts. In addition, we report on a preliminary data exploration that makes use of three quantitative measures for assessing the quality of the data and for better understanding the role that this resource can play in tasks lying at the intersection of Computational Linguistics and learner corpus studies.

OpenEdition

Institutional Research Information System University of Turin

Fake News Spreaders Detection: Sometimes Attention Is Not All You Need

Author: Di Nuovo Elisa
Ilenia Tinnirello
Marco La Cascia
Marco Siino
Publication date: 01/01/2022

Institutional Research Information System University of Turin

The Italian Dubbing and Subtitling of Monster, Inc- An Analysis

Author: Elisa Di Nuovo
Gaia Giaccone
Giorgia Valenti
Publication venue: Lasting Impressions Press
Publication date: 01/06/2018

This study analyses the Italian dubbing and subtitles of the animated film Monsters, Inc., released in 2001 by Disney Pixar and directed by Pete Docter, Lee Unkrich and David Silverman. The paper is divided into three sections, each addressing an (extra)linguistic issue. The first focuses on culture-specific references (CSRs), which are considered one of the hardest aspects of any type of translation. Dialects and registers are analysed in the second section, while the third deals with typical phenomena of spoken language, such as question tags, vocatives and modes of address. For each section, a brief theoretical framework provides the basis for discussing the examples taken from the film (original and dubbed/subtitled versions). In addition, the degree of influence (or difference) between the two versions is considered, and some translation strategies are outlined on the basis of the examples shown.

Directory of Open Access Journals